
Collaborating Authors: Nghệ An Province


On the grid-sampling limit SDE

Bender, Christian, Thuan, Nguyen Tran

arXiv.org Machine Learning

In our recent work [3] we introduced the grid-sampling SDE as a proxy for modeling exploration in continuous-time reinforcement learning. In this note, we provide further motivation for the use of this SDE and discuss its well-posedness in the presence of jumps.


Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges

Van Dinh, Nguyen, Dang, Thanh Chi, Nguyen, Luan Thanh, Van Nguyen, Kiet

arXiv.org Artificial Intelligence

Vietnamese, a low-resource language, is typically categorized into three primary dialect groups belonging to Northern, Central, and Southern Vietnam. However, each province within these regions exhibits its own distinct pronunciation variations. Despite the existence of various speech recognition datasets, none provides a fine-grained classification of the 63 dialects specific to the individual provinces of Vietnam. To address this gap, we introduce the Vietnamese Multi-Dialect (ViMD) dataset, a novel comprehensive dataset capturing the rich diversity of the 63 provincial dialects spoken across Vietnam. Our dataset comprises 102.56 hours of audio, consisting of approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and simultaneously demonstrate the challenges of our dataset, we fine-tune state-of-the-art pre-trained models for two downstream tasks: (1) dialect identification and (2) speech recognition. The empirical results suggest two implications: the influence of geographical factors on dialects, and the limitations of current approaches on speech recognition tasks involving multi-dialect speech data. Our dataset is available for research purposes.
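A quick back-of-the-envelope check of the corpus statistics reported above (102.56 hours, roughly 19,000 utterances, over 1.2 million transcript words); since the utterance count is approximate, the derived averages below are rough estimates, not figures from the paper:

```python
# Per-utterance averages implied by the reported ViMD statistics.
hours, utterances, words = 102.56, 19_000, 1_200_000

avg_seconds_per_utt = hours * 3600 / utterances   # audio seconds per utterance
avg_words_per_utt = words / utterances            # transcript words per utterance

print(round(avg_seconds_per_utt, 1), round(avg_words_per_utt, 1))
```

This works out to roughly 19 seconds of audio and about 63 transcript words per utterance, i.e. fairly long utterances by ASR-dataset standards.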


An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding

Chen, Renqi, Han, Wenwei, Zhang, Haohao, Su, Haoyang, Wang, Zhefan, Liu, Xiaolei, Jiang, Hao, Ouyang, Wanli, Dong, Nanqing

arXiv.org Artificial Intelligence

Genomic selection (GS), as a critical crop breeding strategy, plays a key role in enhancing food production and addressing the global hunger crisis. The predominant approaches in GS currently revolve around employing statistical methods for prediction. However, statistical methods often come with two main limitations: strong statistical priors and linear assumptions. A recent trend is to capture the non-linear relationships between markers by deep learning. However, as crop datasets are commonly long sequences with limited samples, the robustness of deep learning models, especially Transformers, remains a challenge. In this work, to unleash the unexplored potential of the attention mechanism for the task of interest, we propose a simple yet effective Transformer-based framework that enables end-to-end training on the whole sequence. Via experiments on the rice3k and wheat3k datasets, we show that, with simple tricks such as k-mer tokenization and random masking, the Transformer can achieve overall superior performance against seminal methods on the GS tasks of interest.
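The "simple tricks" named in the abstract can be illustrated with a minimal sketch. The function names and the 15% mask rate below are illustrative assumptions, not the paper's actual implementation:

```python
import random

def kmer_tokenize(seq, k=3):
    """Split a long marker/nucleotide sequence into non-overlapping k-mers,
    shrinking the token stream the Transformer must attend over by a factor of k."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]

def random_mask(tokens, mask_token="[MASK]", p=0.15, seed=0):
    """Randomly replace a fraction p of tokens with a mask token, a common
    regularizer when training Transformers on small datasets."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < p else t for t in tokens]

tokens = kmer_tokenize("ACGTACGTACGT", k=3)  # ['ACG', 'TAC', 'GTA', 'CGT']
masked = random_mask(tokens)
```

Non-overlapping k-mers reduce sequence length (important when whole-genome marker sequences are long and samples are few), while random masking injects noise so the model cannot simply memorize the limited training set.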


VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Le-Duc, Khai

arXiv.org Artificial Intelligence

In this work, we present VietMed - a Vietnamese speech recognition dataset in the medical domain comprising 16h of labeled medical speech, 1000h of unlabeled medical speech and 1200h of unlabeled general-domain speech. To the best of our knowledge, VietMed is by far the world's largest public medical speech recognition dataset in 7 aspects: total duration, number of speakers, diseases, recording conditions, speaker roles, unique medical terms and accents. VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration. Additionally, we are the first to present a medical ASR dataset covering all ICD-10 disease groups and all accents within a country. Moreover, we release the first public large-scale pre-trained models for Vietnamese ASR, w2v2-Viet and XLSR-53-Viet, along with the first public large-scale fine-tuned models for medical ASR. Even without any medical data in unsupervised pre-training, our best pre-trained model XLSR-53-Viet generalizes very well to the medical domain, outperforming the state-of-the-art XLSR-53 by reducing WER from 51.8% to 29.6% on the test set (a relative reduction of more than 40%). All code, data and models are made publicly available.
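The "relative reduction of more than 40%" claimed above follows directly from the two reported WERs; a minimal check:

```python
def relative_reduction(baseline_wer, improved_wer):
    """Relative error reduction: the fraction of the baseline error removed."""
    return (baseline_wer - improved_wer) / baseline_wer

# Reported WERs: XLSR-53 at 51.8%, XLSR-53-Viet at 29.6%.
r = relative_reduction(51.8, 29.6)
print(f"{r:.1%}")  # 42.9%
```

So the absolute improvement is 22.2 WER points, and the relative reduction is about 42.9%, consistent with the "more than 40%" claim.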


ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge Source

Le, Hung Tuan, To, Long Truong, Nguyen, Manh Trong, Van Nguyen, Kiet

arXiv.org Artificial Intelligence

Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research on the problem has concentrated on high-resource communities such as English and Chinese, and low-resource languages like Vietnamese still lack corpora and models for fact verification. To bridge this gap, we construct ViWikiFC, the first manually annotated open-domain corpus for Vietnamese Wikipedia fact-checking, comprising more than 20K claims generated by converting evidence sentences extracted from Wikipedia articles. We analyze our corpus along several linguistic dimensions, including the new dependency rate, the new n-gram rate, and the new word rate. We conducted various experiments for Vietnamese fact-checking, including evidence retrieval and verdict prediction. BM25 and InfoXLM (Large) achieved the best results in the two tasks: in evidence retrieval, BM25 achieved an accuracy of 88.30% for SUPPORTS, 86.93% for REFUTES, and only 56.67% for the NEI label; in verdict prediction, InfoXLM (Large) achieved an F1 score of 86.51%. Furthermore, a pipeline approach combining InfoXLM (Large) and BM25 achieved a strict accuracy of only 67.00%. These results demonstrate that our dataset is challenging for Vietnamese language models on fact-checking tasks.
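BM25, the lexical retriever used for evidence retrieval in the experiments above, can be sketched in a few lines of self-contained Python. The k1 and b values below are the usual Okapi defaults, not necessarily the settings used in the paper:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`
    with Okapi BM25: a term-frequency score saturated by k1 and normalized
    by document length via b, weighted by inverse document frequency."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                       # document frequency of each term
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)                  # term frequency within this document
        score = 0.0
        for q in query:
            if q not in tf:
                continue
            idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
            score += idf * tf[q] * (k1 + 1) / (
                tf[q] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

In an evidence-retrieval setting, each claim is tokenized as the query, Wikipedia sentences are the documents, and the top-scoring sentences are returned as candidate evidence.
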


ViLLM-Eval: A Comprehensive Evaluation Suite for Vietnamese Large Language Models

Nguyen, Trong-Hieu, Le, Anh-Cuong, Nguyen, Viet-Cuong

arXiv.org Artificial Intelligence

Evaluation benchmarks play a pivotal role in the development of artificial intelligence (AI) systems. Traditionally, natural language processing (NLP) benchmarks have primarily focused on assessing specific and relatively straightforward abilities. However, the advent of large language models (LLMs), also known as foundation models, has brought about a paradigm shift. These powerful models have demonstrated a wide array of novel capabilities, prompting a redirection in the evaluation focus towards more general and intricate skills, such as comprehensive world knowledge and complex reasoning abilities. To align with the remarkable advancements in LLMs, new benchmarks have emerged to probe the diverse and multifaceted capabilities of these models. For instance, MMLU [8], HellaSwag [25], ARC [4], and TruthfulQA [10] are benchmark datasets that have garnered widespread recognition among researchers and are frequently employed on leaderboards to evaluate the performance of language models. However, these benchmarks are primarily tailored to the English language, resulting in a limited understanding of LLMs' capabilities in other languages, including Vietnamese. Despite the recent surge in powerful Vietnamese LLMs, such as Vistral-7B-Chat [12], PhoGPT-4B-Chat [13], and VinaLLaMA-7B-Chat [16], benchmarking these models on datasets translated from English to Vietnamese, even with perfect translations, cannot adequately assess the true quality of these language models concerning their knowledge about core interests of Vietnamese users.


Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

Davis, Ernest, Aaronson, Scott

arXiv.org Artificial Intelligence

Our test sets were too small and too haphazard to support statistically valid conclusions, but they were suggestive of several findings. We summarize these here and discuss them at greater length in section 7. Over the kinds of problems tested, GPT-4 with either plug-in is significantly stronger than GPT-4 by itself, and almost certainly stronger than any AI that existed a year ago. However, it is still far from reliable; it often outputs a wrong answer or fails to output any answer at all. In terms of overall score, we would judge that these systems perform at the level of a middling undergraduate student. However, their capacities and weaknesses do not align with those of a human student: the systems solve some problems that even capable students would find challenging, yet fail on some problems that even middling high school students would find easy.